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Abstract: Investigating the relation between the structure and behavior of 
complex biological networks often involves posing the following two questions: 
Is a hypothesized structure of a regulatory network consistent with the observed 
behavior? And can a proposed structure generate a desired behavior? Answer- 
ing these questions presupposes that we are able to test the compatibility of 
network structure and behavior. 

We cast these questions into a parameter search problem for qualitative 
models of regulatory networks, in particular piecewise-affine differential equa- 
tion models. We develop a method based on symbolic model checking that 
avoids enumerating all possible parametrizations, and show that this method 
performs well on real biological problems, using the IRMA synthetic network 
and benchmark experimental data sets. We test the consistency between the 
IRMA network structure and the time-series data, and search for parameter 
modifications that would improve the robustness of the external control of the 
system behavior. 

GNA and the IRMA model are available at http://www-helix.inrialpes.fr/gna 

Key-words: systems and synthetic biology, qualitative models of gene net- 
works, symbolic model checking, model validation and system design 
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Parametrisation efRcace de modeles qualitatifs de 
reseaux de regulation a I'aide de model checking 

symbolique 

Resume : L 'etude du lien entre structure et fonctionnement de reseaux bi- 
ologiques complexes revient souvent a poser I'une des deux questions suivantes: 
Est ce qu'une structure hypothetique est coherente avec des comportements 
observes? Est-ce qu'un coniportement desire peut etre obtenu avec une struc- 
ture proposee? Nous formulons ce probleme en un probleme de recherche de 
parame-tres pour des modeles qualitatifs de reseaux de regulations. 

Nous developpons une methode basee sur des techniques de model check- 
ing symbolique qui evite de devoir enumerer toutes les parametrisations pos- 
sibles, et nous montrons que cette methode est bien adaptee aux problemes 
biologiques reels a I'aide du reseau synthetique IRMA et des donnees experi- 
mentales de reference associees. Nous testons la coherence entre la structure 
du reseau IRMA et les donnees de series temporelles, et nous cherchons des 
modifications des parametres qui rendent le systeme plus robuste a un controle 
externe par addition de galactose. 

GNA et le modele IRMA sont disponibles en ligne a I'adresse http://www- 
helix.inrialpes.fr/gna 

Mots-cles : biologic systemique et synthetique, modeles qualitatifs de 

reseaux de genes, model checking symbolique, validation de modeles et con- 
ception de reseaux 
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1 Introduction 

A central problem in the analysis of biological regulatory networks concerns the 
relation between their structure and dynamics. This problem can be narrowed 
down to the following two questions: (a) Is a hypothesized structure of the 
network consistent with the observed behavior? (b) Can a proposed structure 
generate a desired behavior? 

Qualitative models of regulatory networks, such as (synchronous or asyn- 
chronous) Boolean models and piecewise-affine differential equation (PADE) 
models, have been proven useful for answering the above questions. The models 
are coarse-grained, in the sense that they do not specify the biochemical mech- 
anisms in detail. However, they include the logic of gene regulation and allow 
different expression levels of the genes to be distinguished. They are interesting 
in their own right, as a way to capture in a simple manner even complex dy- 
namics induced by the structure of interactions, for example steady states and 
transient responses to external perturbations. They can also be used as a first 
step to orient the development of more fine-grained quantitative ODE models. 
Several applications of logical and PADE models have confirmed their interest 
for the study of large and complex regulatory networks [HI HSl HH US] . 

Qualitative models bring specific advantages over numerical models when 
studying the relation between structure and dynamics. In order to answer ques- 
tions (a) and (b), one has to search the parameter space to check if for some 
parameter values the network can be consistent with the data or a desired con- 
trol objective can be attained. In qualitative models the number of different 
parametrizations is finite and the number of possible values for each parame- 
ter is usually rather low. This makes parameter search easier to handle than 
in quantitative models, where exhaustive search of the continuous parameter 
space is in general not feasible. Moreover, much of the available data in biol- 
ogy is semi-quantitative rather than fully quantitative due to variability in the 
experimental conditions and biological material, imprecise and relative measure- 
ments, low sampling density, ... Qualitative models are more concerned with 
qualitative trends in the data rather than with precise quantitative values. 

Nevertheless, the parametrization of qualitative models remains a complex 
problem. For large models, the state and parameter spaces are usually too large 
to test all combinations of parameter values using existing techniques. This 
makes it difficult to answer questions (a) and (b) for most networks of actual 
biological interest. The aim of this paper is to address this search problem for 
PADE models by treating it in the context of formal verification and symbolic 
model checking. More specifically, we formulate the dynamic properties in tem- 
poral logic and verify by means of a model checker if the network satisfies these 
properties jSl E] . 

Our contributions are twofold. On the methodological side, we develop a 
method that in comparison with our previous work [3] makes it possible to 
analyze very efficiently models with a large state space, and even to analyze 
incompletely parametrized models without the need for exhaustive enumeration 
of all parametrizations. This is achieved by a symbolic encoding of the model 
structure, the constraints on parameter values (if available), and the transition 
rules describing the qualitative dynamics of the PADE models. We can thus 
take full advantage of symbolic model checkers for testing the consistency of the 
network structure with dynamic properties expressed in temporal logics. The 
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current version 8 of GNA [5T] has been extended with export functionaUties to 
generate the symbohc encoding of PADE models in the NuSMV language [3- In 
comparison with related work [I] Ul [51 115| , our method applies to incompletely 
instead of fully parametrized models, provides more precise results, and the 
encoding is efficient without (strongly) simplifying the PADE dynamics. 

On the application side, we show that the method performs well on real 
problems, by means of the IRMA synthetic network and benchmark experimen- 
tal data sets [S] . More precisely, we are able to find parameter values for which 
the network satisfies temporal-logic properties describing observed expression 
profiles, both on the level of individual and averaged time-series. The method 
is selective in the sense that only a small part of the parameter space is found 
to be compatible with the observations. Analysis of these parameter values re- 
veals that biologically-relevant constraints have been identified. Moreover, we 
make suggestions to improve the robustness of the external control of the IRMA 
behavior by proposing a rewiring of the network. 

2 Qualitative model of IRMA network 

2.1 IRMA network 

IRMA is a synthetic network constructed in yeast and proposed as a benchmark 
for modeling and identification approaches [S]. The network consists of five well- 
characterized genes that have been chosen so as to include different kinds of 
interactions, notably transcription regulation and protein-protein interactions. 
The endogenous copies of the genes were deleted, so as to reduce crosstalk of 
IRMA with the regulatory networks of the host cell. In order to further isolate 
the synthetic network from its cellular environment, the genes belong to distinct 
and non-redundant pathways. Moreover, they are non-essential, which means 
that they can be knocked out without affecting yeast viability. 

The structure of the IRMA network is shown in Fig. [Ija). The expression 
of the CBFl gene is under the control of the HO promoter, which is posi- 
tively regulated by Swi5 and negatively regulated by Ashl. CBFl encodes the 
transcription factor Cbfl that activates expression of the GAL4 gene from the 
MET16 promoter. The GAL 10 promoter is activated by Gal4, but only in the 
absence of Gal80 or in the presence of galactose which releases the inhibition of 
Gal4 by GalSO. The GALIO promoter controls the expression of SWI5., whose 
product not only activates the above-mentioned HO promoter, but also the 
ASHl promoter controling the expression of both the GALSO and ASHl genes. 

Notice that the network contains both positive and negative feedback loops. 
Negative feedback loops are a necessary condition for the occurrence of oscilla- 
tions [57] , while the addition of positive feedback loops is believed to increase 
the robustness of the oscillations [28]. This suggests that the network structure, 
for suitable parameter values, might be able to function as a synthetic oscillator. 

2.2 Measurements of IRMA dynamics 

The behavior of the network has been monitored in response to two different 
perturbations [S]: shifting cells from glucose to galactose medium (switch-on 
experiments) , and from galactose to glucose medium (switch-off experiments) . 
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(b) 

Figure 1: Synthetic IRM A network in yeast, (a) Schematic representation of the 
network constructed in [S]. The green and blue boxes are promoter and genes, 
and the yellow and red ovals are proteins and metabolites, (b) PADE model 
of IRMA, with state variables x, protein synthesis constants k, decay constants 
7, and thresholds 9. The input variable Ugai refers to the presence of galactose 
{iigai = 0). The subscripts oaU, Swi5, Ashu Cbfi, Gaiso refer to the proteins 
with the same name. 



The terms 'switch-on' ('switch-off') refer to the activation (inhibition) of SWI5 
expression during growth on galactose (glucose). For these two perturbations, 
the temporal evolution of the expression of all the genes in the network was moni- 
tored by qRT-PCR with good time resolution: samples every 10 min (switch-off) 
or 20 min (switch-on) over more than 3 h. 

Fig. HJa) represents the expression of all genes, averaged over 5 (switch- 
on) or 4 (switch-off) independent experiments. In the switch-off experiments 
(galactose to glucose), the transcription of all genes is shut down. In the switch- 
on experiments, a seemingly oscillatory behavior is present with Swi5 peaks at 
40 and 180 min, while Swi5, Cbfl, and Ashl are expressed at moderate to high 
levels [S]. We remark that the confidence intervals are large (not shown), which 
means that the data are essentially semi-quantitative. 

The analysis of the individual time-series reveals that in some cases the gene 
expression profiles are indeed similar, at least qualitatively, whereas in other 
cases notable differences are observed (for example, the oscillatory behavior is 
not present in all switch-on time-series; Fig. [5Jc)). In the latter case, average 
expression levels may be a misleading representation of the network behavior. 

2.3 PADE model of IRMA network 

We built a qualitative model of the IRMA dynamics using PADE models of 
genetic regulatory networks. The PADE models, originally introduced in |16j . 
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Figure 2: Dynamic behavior of the IRMA network in response to medium shift 
perturbations, (a) Temporal profiles of averaged gene expression measured with 
qRT-PCR during switch-off (left) and switch-on (right) experiments (data from 
[S]). (b) Temporal logic encoding of the switch-off and switch-on behaviors. 
The operator EF (p expresses the possibility to reach a future state satisfying 
(f), whereas the operator EX is used to require the existence of an initial 
state satisfying 0. Ugai low and Ugai high denote the absence and presence of 
galactose, respectively. See [5] for more details on the temporal logic CTL. 
(c) Temporal profile of gene expression in an individual switch-on experiment 
showing a switch-off-like behavior. 
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provide a coarse-grained picture of the network dynamics. They have the fol- 
lowing general form: 






f,{x)^yn[b\{x)-^.,x., ze[l,n] (1) 



where a; G i7 C M>o represents a vector of n protein (or RNA) concentra- 
tions. The synthesis rate is composed of a sum of synthesis constants k', each 
modulated by a regulation function h\{x) G {0,1}. A regulation function is 
an algebraic expression of step functions s'^{xj^Oj) or s~{xj,6j) which formal- 
izes the regulatory logic of gene expression. Oj is a so-called threshold for the 
concentration Xj. The step function s^{xj,9j) evaluates to 1 if Xj > 9j, and 
to if Xj < 9j, thus capturing the switch-like character of gene regulation 
{s~{xj, 9j) = 1 — s^{xj, 9j)). The degradation of a gene product is a first-order 
term, with a degradation constant 7.^ that includes contributions of growth di- 
lution and protein degradation. The models can be easily extended to account 
for proteolytic regulators, but we will omit this here as IRMA does not include 
such factors. 

In the case of IRMA, we define five variables that correspond to the total 
protein concentrations of Cbfl, Gal4, GalSO, Ashl, and Swi5, as well as an input 
variable denoting the concentration of galactose. Notice that the measurements 
of the network dynamics concern mRNA and not protein levels. We assume that 
the variations in mRNA and protein levels are the same, even though this may 
not always be the case. A similar approximation is made in (5], where protein 
and mRNA levels are considered to be proportional. 

The FADE model of the IRMA network is shown in Fig. [TJb). Consider 
for example the equation for the protein Gal4. n'cau ^^ ^^^ basal synthesis 
rate, and k'q^u + i^Gai4 its maximal synthesis rate when the GAL4 activator 
Cbfl is present (i.e., xcbfi > f^cbfi)- Swi5 is regulated in a more complex 
way. The expression of its gene is activated by Gal4, but only when GalSO is 
absent or galactose present (which prevents Gal4 inactivation by GalSO), that 
is, only when not both GalSO is present and galactose absent. The step-function 
expression in Fig. [T]Jb) mathematically describes this condition. We remark 
that for the regulation of CBFl, we take into account that Ashl can override 
the effect of Swi5, that is, inhibition dominates activation. Moreover, Swi5 is 
assumed to have three different thresholds, for the regulation of CBFl, GALSO, 
and ASHL 

The FADE model is a direct translation of the IRMA network into a simple 
mathematical format. The model resembles the ODE model in [5], but notably 
approximates the Hill-type kinetic rate laws by step functions. It thus makes 
the implicit assumption that important qualitative dynamical properties of the 
network are intimately connected with the network structure and the regulatory 
logic, independently from the details of the kinetic mechanisms and precise 
parameter values. Several studies have shown this assumption to be valid in 
a number of model systems [SI [TTl [TU], although care should be exercised in 
deciding exactly when modeling approximations are valid |25j . 

To investigate for the possible existence of unknown interactions between the 
synthetic network and the host, we would like to test given the FADE model 
above whether the network structure and the regulatory logic can account for 
the qualitative trends in the gene expression data observed in [S]. Because in 
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some experiments it has been observed that the addition of galactose does not 
always lead to an activation of the IRMA genes, we also search for parameter 
modifications that renders the network response to an addition of galactose more 
robust. 



3 Search of parameter space using symbolic model 
checking 

3.1 Qualitative analysis of PADE models 

The advantage of PADE models is that the qualitative dynamics of high-dimens- 
ional systems are relatively easy to analyze, using only the total order on pa- 
rameter values rather than exact numerical values [5J I12| . The main difficulty 
lies in treating the discontinuities in the right-hand side of the differential equa- 
tions, at the threshold values of the step functions. Following [18], the use of 
differential inclusions based on Filippov solutions has been proposed in [2] and 
implemented in the computer tool GNA |3]. Here, we recast this analysis in a 
form that underlies the symbolic encoding of the dynamics below. 

The key to our reformulation of the qualitative analysis of the PADE dy- 
namics is the extension of step functions 5+ to interval-valued functions S'^ , 
where 

r [0,0] iixj <9j, 

S+{x,,e,)=\ [0,l]ifx, =0„ (2) 

[ [i,i]iix,>e, 

That is, because the step functions are not defined at their thresholds, we 
conservatively assume that they can take any value between and 1 (see [5] 
for a similar idea). When replacing the step functions by their extensions, 
the regulation functions b\{x) become interval- valued functions _B' : M"g ^■ 
{[0, 0], [0, 1], [1, 1]}, and Eq. ([T]) generalizes to the following differential inclu- 



e F,(x) ^ J2 ^'» ^^(^) - ^^ ^^' * ^ [1' "] (3) 



X 



As shown in Section|Bl for most models the solutions of this differential inclusion 
are the same as the solutions of the differential inclusions defined in [2| . 

The starting-point for our qualitative analysis is the introduction of a rect- 
angular partition P of the state space H,. This partition is induced by the 
union of the two sets Qi and A^, i £ [l,?i], where 8i = {Of \ j € Ji} and 
Ai ~ {J2ieB '^\lli I ^ — ^i)- That is, the partition is a rectangular grid defined 
by the threshold parameters 6l and the so-called focal parameters X^zgb ^\hi- 
The focal parameters are steady-state concentrations towards which the PADE 
system locally converges in a monotonic way |16| . For the variable XGai4^ we 

have @Gal4 = {Ocau} and Aca^ = {0, «Ga^/7Ga^, (^^Ca^ + I^Gau)hGau}- 

Interestingly, the partition has the property that in each domain D £ T?, 
the protein production rates are identical: for all x,y G D, it holds that 
B\{x) = B\{y) = B\{D). As a consequence, the derivatives of the concen- 
tration variables have a unique sign pattern: for all x, y G D, it holds that 



RR n° 7284 



Efficient Parameter Search for Qualitative Models of Regulatory Networks 9 



sign{Fi{x)) — sign{Fi{y)) C { — 1,0,1} j^]- Notice that this property is not ob- 
tained for less fine-grained partitions used in related work PQ [H El El [T31 HSl HZ] ■ 
It will be seen to be critical for the search of parametrized models of IRMA that 
satisfy the time-series data. 

The above considerations motivate a discrete abstraction, resulting in a state 
transition graph. In this graph, the states are the domains £> G 2?, and there is 
a transition from a domain D to another domain D', if there exists a solution 
of the differential inclusion in Eq. [3] that starts in D and reaches D' , without 
leaving DUD'. 

D ^ D' m 3^ solution of dSj), 3t e M^o U {oo} 

such that ^(0) £ D, ^(t) e Z?', and Vi G [0, r], ^(t) e D U D' 

The state transition graph defines the qualitative dynamics of the system, in 
the sense that the states give a qualitative description of the state of the system 
(derivative sign patterns, threshold and focal parameters bounding the domain), 
while paths in the state transition graph describe how this state evolves over 
time (changes in derivative patterns, changes in bounds of domain) [2]. 

We reformulate here the transition rules using the interval extensions of 
the regulation functions. The existence of a transition depends on the sign 
of F at the boundary between the two domains. To capture this notion, we 
introduce an interval- valued function Fi : I? x 2? — !■ 2*, where Fi{D,D') = 
E;gl. '*« -S'(-C') - liD'i, for D,D' e V. Fi{D,D') represents the flow in D 
infinitely close to D' . In order to evaluate Fi{D, D'), we use interval arithmetic 
|23| . For instance, in a domain in which xswis > Q'swis ^'^'^ xasM = ^Ashi, 
we have S+ {x s,m5 , dswis) = [1; 1] a-^d S^{xAshi ,OAshi) = [0,1], so that the 
differential inclusion for xcbfi becomes [^J-^f^rj , kJ^j,^j -|- K^hfil ~ Icbfi ■ xcbfi ■ 

We distinguish three types of transitions, depending on whether the tran- 
sition goes from a domain D to itself {D = D' , internal transition), from a 
domain D to another, higher-dimensional domain D' {D C dD', dimension- 
increasing transition), or from a domain D to another, lower-dimensional do- 
main D' (D' C dD, dimension-decreasing transition), where dD denotes the 
boundary of D in its supporting hyperplane. For dimension-increasing transi- 
tion, we obtain the following rule: 

Prop. 1 (Dimension-increasing transition). Let D.D' £ V and D C dD' . 
Z? — > Z?' is a dimension-increasing transition iff 

1. Vi G [l,n], such that Di and D'^ coincide with a value in 0j U A^, it holds 
that OG Fi{D',D), and 

2. Vi G [l,n], such that Di ^ D'^, it holds that 3a > such that a G 
FdD',D){D',~D,) 

The first condition guarantees that solutions can remain in domains located 
in threshold and focal planes, while the second condition expresses that the 
direction of the flow in the domains {Fi{D' , D)) is consistent with their relative 
position (D^ — Di). The rules for other types of transitions and their proofs can 
be found in Section \K\ 

It can be shown that exact parameter values are not needed for the analysis of 
the qualitative dynamics of a FADE model: it is sufhcient to know the ordering 
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of the threshold and focal parameters [2]. This comes from the fact that the sign 
of Fi, and hence the transitions and the state transition graph, are invariant for 
regions of the parameter space defined by a particular total order on 0^ U A^ 
[2] . We call such a total order a parametrization of the PADE model. 

3.2 Search of parameter space: a model-checking approach 

Verifying the compatibility of the network structure with an observed or de- 
sired behavioral property (Section 12. 2p can be achieved by comparing the state 
transition graph with qualitative trends in the data. For large graphs like that 
obtained for IRMA (which has about 50000 states) this becomes quickly impos- 
sible to do by hand. This has motivated the use of model-checking tools (e.g., 
[1] [21 m [13] ) . For PADE models, each state in the graph is described by atomic 
propositions whose truth-value are preserved under the discrete abstraction, 
such as the above-mentioned derivative sign patterns. The atomic propositions 
are used to formulate observed or desired properties in a temporal-logic formula 
(f> and model checkers test if the state transition graph T satisfies the formula 

Because the number of possible parametrizations and the size of state transi- 
tion graphs rapidly grow with the number of genes, the naive approach consist- 
ing in enumerating all parametrizations of a PADE model, and for each of these 
generating the state transition graph and testing whether T 1= 0, is only feasible 
for the simplest networks. We therefore propose an alternative approach, based 
on the symbolic encoding of the above search problem, without explicitly gener- 
ating the possible parametrizations of the PADE models and the corresponding 
state transition graphs. This enables one to exploit the capability of symbolic 
model checkers to efficiently manipulate implicit descriptions of the state and 
parameter space. 

3.3 Symbohc encoding of PADE model and dynamics 

In this section we summarize the main features of the encoding . We particu- 
larly focus on the discretization of the state space, which connects the symbolic 
encoding to the mathematical analysis of PADE models, and the use of the 
discretization for the computation of Fi{D' , D) in Prop.[Tl which is essential for 
state transition computations. 

The symbolic encoding is based on a discretization of the state space im- 
plied by the partition V. We call C a discretization function that maps D € V 
to a set of unique integer coordinates, and C{D) ~ C{Di) x ... x C(Z3„). 
Let rrii be the number of non-zero parameters in Qi U Ai, i £ [l,?^]■ Then 
C(A) € {0,1,..., 2mi + 1}, and more specifically, C(A) e {0, 2, ... , 2mj} if 
Di coincides with a threshold or focal plane, and C{Di) e {1,3,..., 2mi + 1} 
otherwise. More generally, C{S) = {C{D) \ D C S}, for any set of domains 
S. Obviously, C can also be used for the discretization of parameter values. 
For example, in the case of the variable XGai4 , we have one threshold and three 
focal parameters. Now, let D be a domain in the state space and Dcai^ its com- 
ponent in the xgq^ -dimension. Given the following total order on the thresh- 
old and focal parameters, < KQau/lGaU < OcaU < {i^%ai4 + ^^Ga^)hGal4J 
we find C(0) =. (by definition), C{K%^iJ-fGai4) = 2, CiOaau) = 4, and 

C{l^%al4 + I^Gal4)/lGal4) = 6. 
RR n° 7284 
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The above discretization motivates the introduction of symbohc variables Di, 
b[, el, X{ that encode C(A), C{D[), 0(91), C{Xi), respectively. The different 
conditions in Prop. [T]can be expressed in terms of this encoding. For instance, 
the sign of D'^ — Di simply becomes D[ ~ Di. The translation is less evident 
for the encoding of Fi{D',D), the sign of which needs to be computed in the 
transition rules. Multiplication by 1/^i does not change the sign, but gives the 
more convenient expression 

F,{D,D')h, = Y.^n\h,) B\{D)-D', (4) 

leLi 

Recall that the first term in the righthand side is simply an interval whose upper 
and lower bound are focal parameters, determined by the regulation functions 
B\{D). By redefining the step functions in terms of the symbolic variables: 



s+{Dj,e,)^{ [0,1] i^D, = e, (5) 

each B\{D) can be simply computed by means of interval arithmetic. Evaluating 
the expression 'YlieL-i'^ll^i) ^li^) lea-ds to an interval with focal parameters 
as bounds, and which can therefore be represented by A^. From this interval 
we subtract D'^ to symbolically define Fi{D,D')/^i. The sign of the latter ex- 
pression allows one to check the conditions of Prop. [I] and thus to derive the 
transitions in the state transition graph. The specification of transitions in a 
symbolic way is the main stumble block for the efficient encoding of the PADE 
dynamics, especially when D is located on a threshold plane. In our previous 
work [5] , the computation of transitions required the enumeration of an expo- 
nential number of domains surrounding D |T]. The interval-based formulation 
proposed here avoids this inefficient approach and allows Fi{D,D')/^i to be 
computed in one stroke. 

The implementation in a model checker like NuSMV [7] is straightforward 
with the above encoding. In particular, we apply invariant constraints on the 
symbolic variables to exclude all valuations of Di, D[, Of, X^ that do not cor- 
respond to a valid transition from D to D' for a given parameterization. We 
apply three types of invariants. The first one constrains parameters to remain 
constant. The second one constrains D and D' to be neighbors in the state 
space {e.g., D C dD' for dimension-increasing transitions). The final invariant 
constrains the relative position of D and D' and the parameter order as stated 
in the transitions conditions. For comparison with experimental data, we also 
need to know the variations of concentrations of gene products in each state. 
Formally, it is defined as the derivative sign pattern, and simply corresponds to 
the sign of Fi{D, D') as computed above. 

The initial states of our symbolic structure correspond to each possible 
parametrization and transitions towards all states D. A CTL property holds 
for a symbolic structure if all initial states satisfy 0. Therefore, by testing 
whether -i0 holds, we verify the absence of a parametrization satisfying (p. A 
counterexample to -i<j) thus returns a valid parametrization. The current ver- 
sion 8 of GNA [^ has been extended with export functionalities to generate 
the symbolic encoding of PADE models in the NuSMV language. 
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4 Validation: consistency of IRMA network with 
experimental data 

Are the observations of the IRMA dynamics consistent with the network struc- 
ture? At first sight this question may seem incongruous as one expects this to 
be the case by definition (each genetic construct was tested before integration 
in the yeast ceh). However, in practice it is far from trivial, even if the design 
and construction have been carried out with great care, to avoid interactions 
between the synthetic network and the host. 

4.1 Temporal- logic encoding of observations 

To test the consistency between our FADE model of the IRMA network and the 
experimental data, we express that for each condition, switch-on and switch-off, 
there must exist an initial state of the system and a path starting from this state 
along which the gene expression changes correspond to the observed time-series 
data. For example, for the switch-off time-series we encode that there exists an 
initial state where in absence of galactose the expression of SWI5, CBFl, GAL4 
and ASHl decreases (in the interval [0, 10] min), and from which a state can 
be reached where the expression of SWI5 decreases and the expression of CBFl 
increases (in the interval [10,20] min), etc. The generation of this property, 
called 4>i , from the experimental data leads to the temporal- logic formula shown 
in Fig. m^b). The property is automatically generated from the experimental 
data using a Matlab script. 

To disregard small fluctuations due to biological and experimental noise, we 
considered that changes of magnitude less that 5 • lO"*^ units are not significant. 
This smooths out, for example, Gal4 expression levels in switch-off conditions 
after 40 min. In [5] it was demonstrated by glucose-to-glucose shift experiments 
that the mere resuspension of cells into fresh medium has a network-independent 
effect: the expression of GAL80 and GAL4 is strongly increased in the first 10 
min after resuspension. Therefore, we did not incorporate in our specification 
the very first measurements (in the interval [—10,0]) made just before shifting 
cells to a new medium. 

The data presented in [5| for switch-on and switch-off conditions are the av- 
erage of 5 and 4 individual experiments, respectively. As noticed in Section [2. 21 
expression profiles obtained in similar conditions may differ significantly. In 
the case of such heterogeneous behavior, properties capturing the average gene 
expression profile may be misleading. Consequently, asking for consistency be- 
tween our model and the result of each individual experiment might be more 
appropriate. This leads us to define a second property (f>2 similar to 0i but 
requiring the existence of 9 paths in the graph, one for each of the observed 
behaviors in the 5 switch-on and 4 switch-off experiments. Although the in- 
formation we extract from the experimental data is purely qualitative, only 
concerning trends in gene expression levels, the accumulation of these simple 
observations leads to fairly complex constraints. Froperty 02 involves nearly 
160 constraints on derivative signs. 
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4.2 Testing consistency of network with observations 

We use our symbolic encoding of the PADE dynamics and verify the existence 
of a vahd parameter ordering. We do this by testing the negation of (jji or 
02, such that a negative answer from the model checker proves the existence 
of at least one valid parametrization, as explained in Section 13.31 Moreover, 
the counterexample returned provides one such parametrization. By means 
of this approach, we can prove the existence of a parametrization satisfying 
the averaged time-series data (0i ) . The result was obtained in 49 s on a laptop 
(PC, 2.2 Ghz, 1 core, 2 Gb RAM). The state space contains nearly 50000 discrete 
states and the parameter space is discretized into nearly 5000 different parameter 
orderings. The counterexample of -'4>i, obtained in 100 s, provides a valid 
parametrization (Table [1]) . 

When analyzing the corresponding parametrization, the thresholds are mostly 
higher than the focal parameter for basal expression and lower than the focal 
parameter for upregulated expression, e.g., n^^i^J^Ashi < 0Ashi < {i^^Ashi + 
i'iAshi)/lAshi ■ This is not surprising as the focal parameters correspond to the 
lowest and highest possible expression levels. The threshold at which Ashl con- 
trols CBFl expression is expected to lie between the two extremes. The only 
exception in the parameters found by the model checker is Gal80, for which it 
holds {K%ai80 + i^Gai8o)/lGai80 < 0GaWO- According to this constraint, Gal80 
plays no role in the system, since it cannot exceed the threshold concentra- 
tion above which it inhibits Swi5. This is interesting because it suggests that 
the switch-off behavior may occur even without any inhibition by Gal80, and 
consequently, in a galactose-independent manner. 

The dynamic properties of the PADE model can be analyzed in more detail 
by means of GNA. This shows the existence of an asymptotically stable steady 
state corresponding to switch-off conditions, with low Swi5; Gal4. Gbfl, Ashl, 
and GalBO concentrations. In addition, GNA finds strongly connected com- 
ponents (SCCs) consistent with the observed damped oscillations observed in 
galactose media. However, the attractors co-exist irrespectively of the presence 
or absence of galactose, revealing that galactose does not necessarily drive the 
system to a single attractor for this particular parametrization. 

We also tested whether the above parametrization is consistent with time- 
series data from the individual experiments. In 3 s the model checker shows 
that it does not satisfy the more constraining property 02 • However, we do 
find another parametrization for which (f>2 holds. In this case, all thresholds are 
situated between the basal and upregulated focal parameters (237 s, including 
counterexample generation) . 

4.3 Detailed analysis of valid parameter set 

As stated above, our consistency tests only confirm that a parametrization exists 
for which the structure of the network is consistent with the observed behavior. 
However, it does not say if this is trivially the case (when most parametrizations 
are) or if the properties are selective (when most parametrization are not). To 
investigate this we exhaustively generated all possible parametrizations, and 
tested for each of them property 0i (averaged time-series) and 02 (individual 
time-series). Although the total number of parameter orderings (4860) is fairly 
large, the exhaustive analysis is still manageable for networks of this size. 
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Out of the 4860 completely parametrized PADE models, we found that only 
a surprisingly small subset is consistent with the observations. For the averaged 
time-series, only 12 parametrizations are consistent, while for the individual 
time-series this subset is further reduced to 4 parametrizations (Table [Ij. The 
properties extracted from the data are thus seen to be quite selective. 

These results indicate that to be consistent with the experimental data, the 
activation threshold of CBFl by Swi5, 6*5^^5, must be smaller than the activa- 
tion thresholds oi ASHl and GA180 by Swi5, 6*^^^^ and 6*5^^5 • Interestingly, this 
result is corroborated by independent experimental studies. Fitting of experi- 
mental data on promoter activities to Hill functions showed that the activation 
threshold for the ASHl promoter, controlling ASHl and GAL80 expression, is 
nearly twice as high as the one for the HO promoter controlling CBFl expres- 
sion (Table SI of 0). 

A second finding is that the dynamics of the system is consistent with the 
experimental data even if Ooaiso < i^%ai8ol^Gai80i that is when GAL80 is con- 
stitutively expressed above its inhibition threshold. This indicates that an ef- 
fective regulation of GAL80 expression by Swi5 is of little importance for the 
functioning of the network. And indeed, it was found that GAL80 is not much 
responsive to changes in Swi5 availability: Cantone et al observed that a 6-fold 
increase of SWI5 expression leads to only a negligible (1.08-fold) increase in 
GAL80 expression levels (Fig. 4A in [S]). 

5 Re-engineering: improving external control by 
galactose 

As stated above, it has been experimentally observed that the system response 
to an addition of galactose is not always identical. In one experiment at least, 
the addition of galactose does not significantly changes the system's behavior: 
a switch-off like response is observed in switch on conditions (Fig. [U^c)). To 
obtain a more robust external control of the system, we would like to ensure 
that the addition of galactose drives the system out of the low-Swi5 state. 

5.1 Temporal- logic specification of design objective 

We start by specifying that two attractors can be reached, one in switch-off 
conditions, and one in switch-on conditions. In switch-off conditions, the Swi5 
concentration must eventually remain low, that is, equal to its basal expression 
level Kg^jj llswis ■ This is expressed in CTL as AF AG xswis low. In switch-on 
conditions, an oscillatory behavior in the concentration of Swi5 is expected. It 
can be formulated by means of the formula AG AF {xswis inc A AFx swis dec), 
requiring that an increase in xswtb is observed infinitely often and is necessarily 
followed by a decrease in xswis- In addition to these two basic requirements, we 
impose that in presence of galactose, the Swi5 concentration cannot indefinitely 
stay low: Ugai high — > AF -'Xswis low. We prefix these specifications so as to 
express the possibility (EX) to reach the appropriate attractor from some initial 
state, and the necessity (AX) to leave the switch-off steady state for all initial 
states in switch-on conditions. This gives rise to the following property: 
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03 = EX {ugai high A AG AF {xswiS inc A AFxswtS dec)) 
A EX {ugai low A AF AG xswi5 low) 
A AX {ugai high -> AF -^xswts low) 

5.2 Parametrizations consistent with design objective 

Using symbolic model checking, we test the feasibility of (jj^. In about 2min, 
we find that parametrizations exist for which the system presents the desired 
behavior (Table [T]). Using GNA, we can analyze the proposed parameter order- 
ing. In the presence of galactose, several SCCs are found, with two terminal 
SCCs attracting the major part of the state space that includes notably the 
switch-off state: from a switch-off initial state, oscillations necessarily happen. 
In the absence of galactose, a unique stable steady-state where all genes are off 
is attracting the entire state space. Indeed, although SCCs are present, they 
are non-terminal and one can show that the switch-off steady state is eventually 
always reached. 

As explained above one of the time-series in the switch-on conditions con- 
tradicts our specification. It is consequently not surprising that none of the 
parametrizations consistent with the experimental data satisfies our design re- 
quirements, suggesting that changes are needed. We therefore tried to find other 
parametrizations, consistent with 03 . Our method indeed finds an order on the 
threshold and focal parameters satisfying the property (proven in 126 s), while 
the enumeration of all 4860 parametrizations shows that only 7 are valid (1300 s; 
Table [T]). 

A first surprising feature is that Og^^g < k^^j^ /jswis ■ Swi5 must always 
activate CBFl. Stated differently, this constraint simply suggests to remove the 
regulation of CBFl by Swi5. This can be explained by a qualitative analysis 
of the system dynamics. In the presence of galactose, we expect oscillations 
for Swi5. However, the presence of Swi5 is required for the expression of CBFl 
since the HO promoter functions like an AND gate: HO is on if and only if Swi5 
is present and Ashl is absent. So, if Swi5 is not permanently present, Cbfl and 
then Gal4 might diseappear, causing the system to converge to the switch-off 
state. 

A second surprising feature is that the regulation of GAL80 by Swi5 should 
not be effective. Indeed 0^^^^ < K%^^^/-fswi5 or Bcaiao < i^^amhcawo means 
that either the GAL80 promoter is always activated, or that the Gal80 concen- 
tration is always sufficient to repress SWI5. As above, this suggests to remove 
an interaction, namely the regulation of CAL80 by Swi5. Interestingly, the de- 
mand for increased external control of the system leads us to a simplified design 
in which two out of the three feedback loops are removed. 

6 Discussion 

We proposed a method for efficient search of the parameter space of qualita- 
tive models of genetic regulatory networks. This allows us to test whether a 
hypothesized structure of the network is consistent with the observed behavior, 
or whether a proposed structure can generate a desired behavior. 

On the methodological side, the main novelty is that we develop a sym- 
bolic encoding of the dynamics of PADE models, enabling the use of highly 



RR n° 7284 



Efficient Parameter Search for Qualitative Models of Regulatory Networks 17 



efficient modei-checking toois for analyzing incompletely parametrized models. 
The symbolic encoding avoids the explicit generation of the state space, and 
the enumeration of possible parametrizations. Although developed for PADE 
models, the main ideas underlying the approach carry over to logical models |27j . 

On the biological side, we show the practical relevance of our approach by 
means of an application to the IRMA network. The parameter constraints we 
obtained are precise, have a clear biological interpretation, and are consistent 
with independent experimental observations. Even when considering complex 
dynamical properties, the search of the parameter space takes at most a few 
minutes. Our results seem to confirm the intended separation of IRMA from 
the host network, and suggest that to obtain a more robust response to the 
addition of galactose, an effective rewiring of the network would be needed. 

In comparison with traditional quantitative modeling approaches, the results 
we obtain are quite general, since they do not depend on a specific representa- 
tion of the molecular details of the interactions and on specific parameter values. 
Moreover, the analysis is exhaustive in the sense that the entire parameter space 
is scanned. These two features are particularly interesting for negative results, 
such as showing that a given design is not likely to present a desired behavior. In 
contrast, quantitative ODE models like those developed in |5] do not predict a 
range of possible behaviors but rather single out one likely behavior with quan- 
titative traits. Qualitative and quantitative approaches provide complementary 
information on system dynamics. 

In comparison with other analysis and verification methods developed for 
similar modeling formalisms [TJ HI [HI I15| , our approach is original in two re- 
spects. First, it applies to incompletely parametrized models and can handle 
any dynamical property of the network expressible in the temporal logic sup- 
ported by the model checker. Second, we reason at a finer abstraction level, in 
that we take into account dynamics on the thresholds and work with a parti- 
tion of the state space preserving derivative sign patterns. The latter feature is 
particularly well-suited for the comparison of model predictions with time-series 
data in IRMA. 

An interesting direction for further research would be to consider even more 
general problems in which not only parameters but also regulation functions are 
incompletely specified. This would make a connection with work on the reverse 
engineering of Boolean models {e.g., [TOll^l^ . 
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Appendix 

A Transition rules 

Prop. 2 (Internal transition). Let D,D' e V and D = D' . D ^ D' is a 
internal transition iff 

Vi e [!,"-], such that Di coincides with a value in Qi U A^, it holds that 
OeF,iD,D) 

Sketch of the proof. 

(Necessity): If for some i G [l,7^], Di coincides with a value in Qi U A^ and 
^ F,{D,D), then because Vx £ D, F,{x) = F,{D,D), any solution ^ of dS]) 
starting in D satisfies ^i(O) ^ 0, and consequently instantaneously leaves D. 
(Sufficiency): We only need to show that for some x G D, there exists a solution 
^ of (j31) that remains in D for some (possibly small) time interval [0,r]. Let 
xq be any point in D. For all dimensions i where Di coincides with a value in 
0; U Ai, choose ^i{t) — xq. for i > 0. For all other dimensions i, choose any 
solution of the differential inclusion Xi S Fi{x). Then for a sufficiently small 
T > 0, ^(<) = (^i(t), . . . £,n{t)) remains in D and is a solution of Q on [0, r]. D 

Prop. 3 (Dimension-increasing transition). Let D,D' G V and D C dD'. 
£) — > D' is a dimension-increasing transition iff 

1. Vi £ [l,n], such that Di and D'^ coincide with a value in QiU Ai, it holds 
that eF,{D',D), and 

2. Vi G [!,«.], such that _Di ^ D'^, it holds that 3a > such that a G 
F,(i?',I?)(i?^-A) 

Sketch of the proof. 

(Necessity): Condition 1 expresses that if Di and D'^ are singletons, then any 
solution ^ remaining in D' should satisfy ^i{t) = 0. The proof is made as in 
Proposition [5] For condition 2, assume that for some i such that Di ^ D'^, 
all a G Fi{D' , D) {D[ — Di) are non-positive. Moreover, assume without loss of 
generality that D'i — Di > 0. Then denoting Aj and Ai the bounds of the interval 
Ei6L.(«'/7i) Bl{D'), we have for any x G D, Fi{D',D)/-f, = [A,, A,] - x, < 0. 
We deduce that for all x' G D' , Fi{x')/ji = [Aj,Ai] — .t- < 0. Consequently, 
given the relative positions of D and D', no solution can enter D' from D. 

(SufHciency): We show that there exists a solution ^ of ([3]) that starts in 
some Xq Cz D (^(0) = xq) and enters and remains in D' for some (possibly 
small) time interval {£,{t) G D',t g]0,t]). Let xq be any point in D. For all 
dimensions i G [!,"-] where Di and D'^ are singletons, choose ^i{t) ~ xq. for 
t > 0. For all dimensions i G [l,n] where D'^ ~ Di > (the case D'^ — Di < 
being symmetrical), Di is a singleton, and for any x £ D, max Fi(D' ,D)/ji — 
Xi ~ Xi > implies that max_F!i(a;') = 7i(Ai — x'i) > for all x' in a (possibly 
small) neighborhood of a; in D' . Then choose for ^i the solution of the differential 
equation Xi = %{Xi — Xi) with ^i(O) = aiQ;. For all other dimensions, choose 
any solution of the differential inclusion Xi G Fi (x) , with ^i (0) = xq. . Then for 
a sufficiently small r > 0, ^(t) = ('^i(i), . . ■£,n{t)) starts in D, remains in D' on 
]0, t] and is a solution of ^ on [0, r]. D 
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Prop. 4 (Dimension-decreasing transition). Let D,D' G V and D' C dD. 
D — > D' is a dimension-decreasing transition iff 

A) 1. Vi G [l,n], such that Di and D'^ coincide with a value in 0^ U A^, it 

holds that G F^{D, D'), and 

2. Vi G [l,n], such that D^ 7^ D'^, it holds that 3a > such that 
aeF,iD,D'){D[-D,) 

B) orVi G [l,n], Oe F,{D,D') 

Sketch of the proof. 

(Necessity): Assume that Al does not hold. Then B does not hold either. This 
means that for some i G [l,n] such that Di and D[ coincide with a value in 
Qi U Ai, it holds that ^ Fi{D, D'). Then it can be shown as in the proof of 
Prop. [5] that no solution can remain in D. Now, assume that neither A2 nor B 
holds. As in the proof of Prop. [3]and using the same notations, we can show that 
not A2 implies that if D[ — Di > 0, then Ai — x^ < 0. If Ai — x^ < 0, there cannot 
be transitions from D to D' . If Ai — x'i ~ 0, one can show that the solutions of 
the differential equation it = mayi Fi{x) reach Ai = x'i after an infinite amount 
of time. Then, obviously no solution of ([3]) can reach D'^ = {x'i} in lesser time. 
But then, the asymptotic convergence towards some point x' G D' implies that 
for all i G [l,n], G liTa^^x' Fi{x), and hence G Fi{D,D'). Indeed, if for 
some i G [1,?^] and e > 0, Fi{D,D') > e (or equivalently if Fi{D,D') < e), any 
solution would leave in finite time any neighborhood in D of any point x' G D'. 
(Sufficiency): Assume that conditions Al and A2 hold. Then one can con- 
struct as in the proof of Prop [3] a solution ^ that starts and remains in D some 
time interval [0,r[ (^(t) e D,t e [0, r[) and enters in D' at time r (^(r) G D'). 
Alternatively, assume that condition B holds. Then, Vz G [l,n], G Fi{D,D') 
implies that for some x* G D', G X]/ei ('^i/T*) -^i(^) ~ ^i' * ^ [^,n]. Let xq 
be any point in D and ^i(i) be the solution of Xi = ji{x* ~ Xi) on [0, cx)[ with 
^i(O) = xoi, i G [1,«]. One can check that S, = (^1, . . . ,^n) is a solution of ^ 
such that Vi > 0, £,{t) G D, and limt^oo^{t) ^ x* e D' . D 

B Comparison with previous definition of dynam- 
ics 

In [2], we introduced a different definition of the dynamics in regions of step 
function discontinuities -that is, threshold hyperplanes- also based on differential 
inclusions. The goal of this section is to show that in most cases the differential 
inclusions, and hence the set of solutions, are the same for both definitions. 

The definition proposed in [5] makes use of the notions of regular and singular 
domains. Intuitively speaking, singular domains are located on threshold or 
focal planes, contrary to regular domains. Moreover, one defines for any singular 
domain D, R{D) as the set of regular domains surrounding D. We refer to [5] 
for the precise definition of these notions. 

For simplicity of notations, we set R{D) = D ioi a. regular domain D. Also, 
for having more compact notations, we introduce gi{D,x) as the value of / in 
D infinitely close to x: gi{D,x) = liniy^x.yeD fi{y)- Naturally, gi{D,x) is well 
defined only if x is in D, the closure of D, and gi{D, x) = fi{x) ii x G D. Then 
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in [2], the dynamics is defined as 

Xi £ GAx) — \ min oAD'x), max gAD'x)] (6) 

D'eR{D) D'eR{D) 

The definition we propose in this paper is more conservative than the one given 
in [2]. 

Prop. 5. 

G^{x)cF,{x), xen 



Sketch of the proof. 

As was done for gi{D, x), we define s'^lD, Xj, dj) and b\{D, x) as liuiy^x.yeD s^iUj, ^j) 

and limy^x.yeD b\{y), respectively. For any x G D, D (Z ft, one can easily show 

that 

VL»' e RiD), s+{D',x.j,9j) G S+{xj,ej) 

If Dj ^ {Oj}, then this obviously holds, as 5+(a-j, fl^) = [0, 0] (or [1,1]) and for 
all D' G R^{D), s+{D',Xj,ej) = (or 1). If D^ = {0j}, then S+{xj,0j) = [0,1], 
and necessarily, s'^{D' ,Xj,6j) G S^{xj,9j) for all D' G R{D). Then, from 
interval arithmetics, we have 

yD' eR{D), h\{D',x)&B\{x) 

and finally 

yO' G R{D), g^{D',x) G F^{x) 

From the definition of Gi{x), we conclude that Gi{x) C Fi{x). D 

In most cases, the inclusion of Prop\^is an equality. In fact strict inclusion 
can arise only in two cases. The first one occurs when a protein has a dual role 
(activator and inhibitor) with a same activity threshold on a single promoter. 
For example, this is the case if some b\ term equals s^{xj, 9j) ■ (1 — s^{xj, Oj)). 
Indeed, for Xj = 9j, it holds that Bl{x) = [0, 1] whereas max£)/g/j(£3) h\{D' ,x) = 
0. The second case occurs when a protein has a dual role with a same activity 
threshold on two different promoters of a gene. For example, this is the case 
if fi{x) equals k]s^{xj,0j) + k|(1 — s~^(xj,9j)) — jiXi. Indeed, for Xj = Oj it 
holds that Fi{x) — [0, n\ + k?] — ^iXi, whereas Gi{x) = [k\, k|] — 7iXi, assuming 
k\ < nf . These rare cases appear in none of the FADE models developed so far 
and distributed with GNA, and nught be considered as modeling problems. 
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